Maybe add a special logo here? Twitch x LSF thing idk

Getting Started

Recently, Twitch literature has begun characterizing Twitch communities through Twitch chat, viewership trends, and content, but no known projects have used resources outside of Twitch to understand how Twitch communities manifest and interact with one another.

LivestreamFail (LSF) is a subreddit dedicated to sharing Twitch clips, general Twitch news, and Twitch drama. It is one way smaller streamers get noticed, and it is a platform that I can use to compare big and small communities. I'm interested in how emote use differs between smaller and larger communities because I believe emote meanings and sentiments are being actively redefined. This analysis is split into an LSF part and a Twitch emote-sentiment part.

This analysis first investigates users, posts, and comments (sentiment and topic) on r/LivestreamFail, and then investigates the chat data and emote use from the Twitch clips that were featured in LSF posts.

Goals

Libraries Used

library(pacman)

p_load(tidyverse,
       tidytext,
       tm,
       lubridate,
       stringr, 
       text2vec,
       jsonlite,
       widyr,
       quanteda,
       visNetwork,
       igraph,
       ggraph,
       DT,
       ggthemes)

Reddit Data: r/LivestreamFail

LSF STUFF SHOULD GO HERE.

Twitch Data

Twitch Data: The collection

tl;dr - Twitch DMCA issues, streamer bans, and the app I used prevented me from downloading a lot more data.

Reddit data was gathered with Python and PRAW (the Python Reddit API Wrapper), pulling recent posts from r/LivestreamFail (October 2020). This resulted in over 900 Reddit posts. The data was then used in R to scrape the links and document whether each clip had chat available for download.
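As a rough illustration of the link step, here is a minimal sketch of pulling Twitch clip slugs out of post URLs with stringr. The `posts` data frame, its `title` and `url` columns, and the example slugs are all made up for illustration, and the real export from PRAW may be shaped differently. It only covers finding the clip links, not checking whether chat is available.

```r
library(stringr)

# Hypothetical stand-in for the Reddit export; real column names may differ
posts <- data.frame(
  title = c("clip one", "text post", "clip two"),
  url = c("https://clips.twitch.tv/ExampleSlugOne",
          "https://www.reddit.com/r/LivestreamFail/comments/abc123/",
          "https://clips.twitch.tv/ExampleSlugTwo?tt_medium=redt"),
  stringsAsFactors = FALSE
)

# Capture the slug after clips.twitch.tv/ (letters, digits, "_" and "-");
# column 2 of the str_match() result holds the first capture group
posts$clip_slug <- str_match(posts$url, "clips\\.twitch\\.tv/([A-Za-z0-9_-]+)")[, 2]

# Keep only the posts that point at a Twitch clip
clip_posts <- posts[!is.na(posts$clip_slug), ]
```

Posts that link elsewhere get `NA` for `clip_slug`, so they drop out in the last step.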

To actually download the Twitch chat, I used an application by lay295 and zigagrcar, found on GitHub.

The Digital Millennium Copyright Act is affecting Twitch in a big way.

Twitch is currently in hot water over DMCA claims, and it is banning streamers for repeatedly streaming “copyrighted” songs. One method streamers use to combat this is deleting their content shortly after it is broadcast. This affected data collection, since the collected Twitch clips were being actively taken down.

As a result, Twitch chat was collected from only 227 of the links present in the Reddit posts.

Emote Data

R and RSelenium were used to scrape the emote data from FrankerFaceZ and BetterTTV. Roughly the top 300 emotes from each site were collected (emote name and link to image).


what chat looks like


Looking at twitch emotes

Note: the BTTV emotes need to be updated, and I am not sure whether GIF emotes work.

# https://i.stack.imgur.com/kLMaS.jpg

# Build an HTML <img> tag for each emote so the table can show the image inline
test <- emote_data %>%
  mutate(emote_image = paste0('<img src="', emote_link, '" height="52">')) %>%
  select(emote_name, emote_image)

# escape = FALSE renders the HTML in emote_image instead of printing it as text
datatable(test, escape = FALSE)

Descriptions of twitch chat

This chart shows how many unique chat lines there are per streamer. This metric is useful for understanding which streamers may be getting the most attention at a given point in time on LSF. However, it should later be controlled for clip length, since longer clips offer more opportunity for chat engagement.

# Ten streamers with the most chat lines
data %>%
  count(streamer, sort = TRUE) %>%
  head(n = 10) %>%
  ggplot(aes(x = reorder(streamer, -n), y = n)) +
  geom_col() +
  theme_wsj(base_size = 12, color = "green") +
  theme(axis.text.x = element_text(size = 12, angle = 15, vjust = .55)) +
  labs(title = "Which streamer has the most chats?")
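One possible way to control for clip length is to turn the raw counts into a rate. The sketch below is only an illustration: the `clip_id` column, the `clips` table, and the `clip_seconds` values are hypothetical, since clip length is not part of the data shown here.

```r
library(dplyr)

# Toy chat log, one row per chat line (hypothetical values)
chat <- tibble(
  streamer = c("A", "A", "A", "A", "B", "B"),
  clip_id  = c("a1", "a1", "a1", "a1", "b1", "b1")
)

# Hypothetical clip metadata: length of each clip in seconds
clips <- tibble(
  clip_id      = c("a1", "b1"),
  clip_seconds = c(60, 20)
)

# Chat lines per minute of clip, pooled over each streamer's clips
chat_rate <- chat %>%
  count(streamer, clip_id, name = "chat_lines") %>%
  left_join(clips, by = "clip_id") %>%
  group_by(streamer) %>%
  summarise(lines_per_minute = 60 * sum(chat_lines) / sum(clip_seconds)) %>%
  arrange(desc(lines_per_minute))
```

In this toy data streamer A has more chat lines overall, but streamer B has the busier chat once clip length is taken into account.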


This visualization shows the most active Twitch chatters in our dataset. In a larger dataset, finding these high-interaction chatters may be useful for drawing links between communities, or even for creating a contributor badge on Twitch (like the founders badge).


# Define a "not in" operator as the negation of %in%
`%notin%` <- Negate(`%in%`)

# StreamElements, Streamlabs, and Nightbot are bots, so leave them out
data %>%
  filter(user %notin% c("StreamElements", "Streamlabs", "Nightbot")) %>%
  count(user, sort = TRUE) %>%
  head(n = 10) %>%
  ggplot(aes(x = reorder(user, -n), y = n)) +
  geom_col() +
  theme_wsj(base_size = 12, color = "green") +
  theme(axis.text.x = element_text(size = 8, angle = 15, vjust = .55),
        plot.title = element_text(size = 20)) +
  labs(title = "Which user has the most chats?")

Out of the 100 clips, which users can be seen in multiple high-scoring clips, or in multiple chats?

REVISIT THIS ONE

data %>% group_by(streamer,user) %>% count(sort = T) %>% head(n = 10)
## # A tibble: 10 x 3
## # Groups:   streamer, user [10]
##    streamer      user                   n
##    <chr>         <chr>              <int>
##  1 ZeratoR       Crackmort             74
##  2 EsfandTV      Eltefan               54
##  3 Trainwreckstv Homie_from_compton    46
##  4 Mizkif        Gekon                 42
##  5 EsfandTV      WaterLaws             39
##  6 Trainwreckstv LeeqoX                36
##  7 EsfandTV      newmanji              32
##  8 EsfandTV      Nevarixxx             31
##  9 EsfandTV      IcePal                30
## 10 Trainwreckstv juniorrr              30
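The table above counts chat lines per streamer-user pair, so it does not yet show which users appear in more than one chat. A minimal sketch of one way to get at that follows, using a toy `chat` tibble with the same `streamer` and `user` columns. The values and the `n_streamers` name are made up for illustration.

```r
library(dplyr)

# Toy chat log with the same column names as the real data
chat <- tibble(
  streamer = c("A", "A", "B", "B", "C", "A"),
  user     = c("u1", "u2", "u1", "u3", "u1", "u2")
)

# Count how many different streamers' chats each user appears in,
# then keep only users seen in more than one
multi_chat_users <- chat %>%
  distinct(user, streamer) %>%
  count(user, name = "n_streamers", sort = TRUE) %>%
  filter(n_streamers > 1)
```

The `distinct()` call matters here: without it, a user who spams one chat would look the same as a user who is present in many chats.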


This plot gives further insight into the demographics of the top 5 streamers' communities. For each streamer, it shows the number of chat accounts by the year in which the account was created. As an example, one conclusion that may be drawn is that forsen and Mizkif are not attracting new accounts (new users or ban evaders) to their channels. Another is that Trainwreckstv attracted a lot of new users in 2018 and perhaps played a significant role in bringing new users to Twitch. I should investigate further to understand what happened with Train in 2018; this was perhaps his drama year with MitchJones (a popular WoW streamer) or The Speech.


# Five streamers with the most chat lines
top_5_streamers <- data %>%
  count(streamer, sort = TRUE) %>%
  head(n = 5)

# Count chat lines by account creation year for each of the top 5 streamers
data %>%
  filter(streamer %in% top_5_streamers$streamer) %>%
  mutate(date_year = year(date)) %>%
  count(date_year, streamer, sort = TRUE) %>%
  ggplot(aes(x = date_year, y = n, color = streamer)) +
  geom_line(size = 2) +
  theme_wsj(base_size = 12, color = "green") +
  labs(title = "Streamer Communities: Account Creation Dates", subtitle = "Top 5 Streamers") +
  theme(plot.title = element_text(size = 15),
        plot.subtitle = element_text(size = 8),
        legend.title = element_blank(),
        legend.position = "bottom")


Tokens, bigrams, and trigrams can give us insight into the popular emotes, words, and spam that occur in these chats.


tokens <- data %>%
  unnest_tokens(word,body)%>% 
  filter(str_detect(word,"^[:alpha:]"))

tokens %>% glimpse(width = 50)
## Rows: 159,517
## Columns: 4
## $ user     <chr> "Humorous_Chimp", "mayodongs...
## $ date     <dttm> 2013-03-29 16:35:36, 2019-1...
## $ streamer <chr> "Jerma985", "Jerma985", "Jer...
## $ word     <chr> "omegalul", "out", "of", "mo...
tokens %>% group_by(word) %>% count(sort = T)%>%
  head(n=10) %>% 
  ggplot(aes(x= reorder(word,-n),y=n))+
  geom_col()+
  theme_wsj(base_size = 12, color = "green")+
  theme(axis.text.x = element_text(angle = 25))+
  labs(title ="Token Counts")
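One thing to keep in mind when reading these counts: `unnest_tokens()` lowercases text by default (`to_lower = TRUE`), which is why `omegalul` appears in lowercase above, while Twitch emote codes are case-sensitive. The sketch below shows the difference on a toy `chat` tibble and then keeps only the tokens that match an emote name. The example messages and the small `emotes` table are made up; in the real analysis the lookup would presumably be `emote_data`.

```r
library(dplyr)
library(tidytext)

# Toy chat messages (hypothetical)
chat <- tibble(body = c("OMEGALUL out of mana", "omegalul", "Kappa kappa"))

# Default behaviour: every token is lowercased
lowered <- chat %>% unnest_tokens(word, body)

# Keep the original case so emote codes survive intact
cased <- chat %>% unnest_tokens(word, body, to_lower = FALSE)

# Hypothetical emote lookup with the same emote_name column as emote_data
emotes <- tibble(emote_name = c("OMEGALUL", "Kappa"))

# Count only the tokens that are exact, case-sensitive emote matches
emote_counts <- cased %>%
  semi_join(emotes, by = c("word" = "emote_name")) %>%
  count(word, sort = TRUE)
```

With the default lowercasing, none of the tokens would match the emote names exactly, so the case-preserving version is the one to join against an emote list.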

data %>%
  unnest_tokens(bigram,body,token = 'ngrams',n = 2)%>% 
  filter(str_detect(bigram,"^[:alpha:]")) %>% 
  group_by(bigram) %>% count(sort = T)%>%
  head(n=10) %>% 
  ggplot(aes(x= reorder(bigram,-n),y=n))+
  geom_col()+
  theme_wsj(base_size = 12, color = "green")+
  theme(axis.text.x = element_text(angle = 25,size = 9))+
  labs(title ="Bigram Counts")

data %>%
  unnest_tokens(trigram,body,token = 'ngrams',n = 3)%>% 
  filter(str_detect(trigram,"^[:alpha:]")) %>% 
  group_by(trigram) %>% count(sort = T)%>%
  head(n=10) %>% 
  ggplot(aes(x= reorder(trigram,-n),y=n))+
  geom_col()+
  theme_wsj(base_size = 12, color = "green")+
  theme(axis.text.x = element_text(angle = 25,size = 9))+
  labs(title ="Trigram Counts")